Adaptive Data Partitioning Using Probability Distribution
Authors
Abstract
Many computing problems benefit from dynamic data partitioning: dividing a large amount of data into smaller chunks with better locality. When data can be sorted, two partitioning methods are commonly used. The first selects pivots, which enables balanced partitioning but incurs a large overhead of up to half of the sorting time. The second uses simple functions, which is fast but requires that the input data conform to a uniform distribution. In this paper, we propose a new method that partitions data using the cumulative distribution function. It partitions data of any distribution in linear time, independent of the number of sublists. Experiments show a 10-30% improvement in partitioning balance and a 20-70% reduction in partitioning overhead. The new method is more scalable than existing methods: it yields greater benefit as the data set and the number of sublists grow. By applying this method, our sequential sort beats quicksort by 20%, and our parallel sort outperforms the previous sorting algorithm by 33-50%.
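The idea of partitioning through a cumulative distribution function can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details the abstract does not give. It assumes the CDF is estimated from a random sample with an equal-width histogram and interpolated linearly inside each cell. The names `build_cdf` and `partition` and the parameters `sample_size` and `cells` are hypothetical choices made for this illustration. Each element is mapped to a sublist with a constant number of operations, so the cost per element does not depend on the number of sublists.

```python
import random

def build_cdf(sample, cells=256):
    """Estimate a CDF from a sample using equal-width histogram cells.

    Returns the lower bound, the cell width, and the cumulative
    fraction of the sample at the upper edge of each cell.
    """
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / cells or 1.0   # fall back to 1.0 if all values are equal
    counts = [0] * cells
    for x in sample:
        counts[min(int((x - lo) / width), cells - 1)] += 1
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / len(sample))
    return lo, width, cdf

def partition(data, k, sample_size=1000, cells=256, seed=0):
    """Split data into k order-preserving sublists using an estimated CDF."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    lo, width, cdf = build_cdf(sample, cells)
    buckets = [[] for _ in range(k)]
    for x in data:
        pos = (x - lo) / width
        cell = min(max(int(pos), 0), cells - 1)
        below = cdf[cell - 1] if cell > 0 else 0.0
        # Linear interpolation inside the cell keeps the estimate monotone in x.
        frac = min(max(pos - cell, 0.0), 1.0)
        p = below + (cdf[cell] - below) * frac
        # Constant-time mapping from estimated CDF value to sublist index.
        buckets[min(int(p * k), k - 1)].append(x)
    return buckets

if __name__ == "__main__":
    rng = random.Random(1)
    values = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    parts = partition(values, 8)
    print([len(b) for b in parts])   # sizes should be roughly equal
```

Because the estimated CDF is non-decreasing, every element in sublist i is no larger than any element in sublist i+1, so sorting each sublist independently and concatenating the results yields a fully sorted list. How well the sublists balance depends on how closely the sampled histogram matches the true distribution.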
Related resources
Predicting Implantation Outcome of In Vitro Fertilization and Intracytoplasmic Sperm Injection Using Data Mining Techniques
Objective: The main purpose of this article is to choose the best predictive model for IVF/ICSI classification and to calculate the probability of IVF/ICSI success for each couple using artificial intelligence. We also aimed to find the most effective factors for predicting ART success in infertile couples. Materials and Methods: In this cross-sectional study, the data of 486 patients are colle...
Hydrograph Estimation based on Various Components of Rainfall Using Adaptive Neuro-Fuzzy Inference System in Kasilian Watershed
Flood hydrograph preparation and estimation are considered comprehensive sources of information for soil and water managers and planners. However, it is not feasible to prepare hydrographs for all watersheds. Therefore, suitable flood hydrograph estimation and modeling using available rainfall data seems necessary. The study area is located in Kasilian representative watershed in Mazandaran province compr...
Adaptive Clutter Edge Estimation in Weibull Clutter Using the UMPI Pre-detector
In radar detection, the existence of the clutter edge in the reference samples considerably degrades the performance of the detector. Hence, clutter edge estimation not only improves the CFAR detectors, but also can be used for partitioning the various areas of the clutter in the clutter map. In this paper, we propose an adaptive algorithm for detecting the clutter edge between two Weibull clut...
ARMaDA: An Adaptive Application-sensitive Partitioning Framework for SAMR Applications
Distributed implementations of dynamic adaptive mesh refinement techniques offer the potential for accurate solutions of physically realistic models of complex physical phenomena. However, configuring and managing the execution of these applications presents significant challenges in resource allocation, data distribution and load balancing, communication and coordination, and runtime management...
An Adaptive Approach to Increase Accuracy of Forward Algorithm for Solving Evaluation Problems on Unstable Statistical Data Set
Nowadays, hidden Markov models are extensively used for modeling stochastic processes. These models help researchers establish and implement the desired theoretical foundations using Markov algorithms such as the Forward algorithm. However, using the stability hypothesis and the mean statistic to determine the values of Markov functions on an unstable statistical data set has led to a significant reducti...
Publication date: 2003